AfterWork Data Science: Data Visualisation with Python Project

1. Defining the Question

a) Specifying the Data Analysis Question

Analyse data for Hilton International Hotels to provide strategy recommendations that will lead to revenue growth.

b) Defining the Metric for Success

The solutions to the following questions will help us answer our research question:

  • When is the best time of year to book a hotel room?
  • When is the optimal length of stay in order to get the best daily rate?
  • How will you know if a hotel was likely to receive a disproportionately high number of special requests?

c) Understanding the context

You are a Data Scientist working for Hilton International Hotels and need to help the management decide on the strategies that will lead to revenue growth.

You have a dataset containing information on the various chain of hotels, including when customers made the booking, length of stay, the number of adults, children, or babies, and the number of available parking spaces, among other things.

d) Recording the Experimental Design

We will first perform data exploration, data cleaning, and analysis to come with appropriate recommendations.

e) Data Relevance

The data provided was relevant to the research question since it answers the research questions

2. Reading the Data

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# install.packages("reshape")
library(reshape)
## 
## Attaching package: 'reshape'
## 
## The following object is masked from 'package:dplyr':
## 
##     rename
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, smiths
# install.packages("knitr")
# install.packages("rmarkdown")
library(knitr)
library(rmarkdown)
# Load the data below
# --- 
# Dataset url = https://bit.ly/2WvQbhJ
# --- 
# YOUR CODE GOES BELOW
# 
hotel_booking_df <- read.csv("hotel_bookings.csv", header = TRUE, na.strings=c("NULL",NA))
# Checking the first 5 rows of data
# ---
# YOUR CODE GOES BELOW
# 
head(hotel_booking_df, 5)
# Checking the last 5 rows of data
# ---
# YOUR CODE GOES BELOW
# 
tail(hotel_booking_df, 5)
# Sample 10 rows of data
# ---
# YOUR CODE GOES BELOW
# 
sample_n(hotel_booking_df, 10)
# Checking number of rows and columns
# ---
# YOUR CODE GOES BELOW
#  
dim(hotel_booking_df)
## [1] 119390     32
# Checking datatypes
# ---
# YOUR CODE GOES BELOW
# 
glimpse(hotel_booking_df)
## Rows: 119,390
## Columns: 32
## $ hotel                          <chr> "Resort Hotel", "Resort Hotel", "Resort…
## $ is_canceled                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, …
## $ lead_time                      <int> 342, 737, 7, 13, 14, 14, 0, 9, 85, 75, …
## $ arrival_date_year              <int> 2015, 2015, 2015, 2015, 2015, 2015, 201…
## $ arrival_date_month             <chr> "July", "July", "July", "July", "July",…
## $ arrival_date_week_number       <int> 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,…
## $ arrival_date_day_of_month      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ stays_in_weekend_nights        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ stays_in_week_nights           <int> 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, …
## $ adults                         <int> 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ children                       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ babies                         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ meal                           <chr> "BB", "BB", "BB", "BB", "BB", "BB", "BB…
## $ country                        <chr> "PRT", "PRT", "GBR", "GBR", "GBR", "GBR…
## $ market_segment                 <chr> "Direct", "Direct", "Direct", "Corporat…
## $ distribution_channel           <chr> "Direct", "Direct", "Direct", "Corporat…
## $ is_repeated_guest              <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ previous_cancellations         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ previous_bookings_not_canceled <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ reserved_room_type             <chr> "C", "C", "A", "A", "A", "A", "C", "C",…
## $ assigned_room_type             <chr> "C", "C", "C", "A", "A", "A", "C", "C",…
## $ booking_changes                <int> 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ deposit_type                   <chr> "No Deposit", "No Deposit", "No Deposit…
## $ agent                          <int> NA, NA, NA, 304, 240, 240, NA, 303, 240…
## $ company                        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ days_in_waiting_list           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ customer_type                  <chr> "Transient", "Transient", "Transient", …
## $ adr                            <dbl> 0.00, 0.00, 75.00, 75.00, 98.00, 98.00,…
## $ required_car_parking_spaces    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ total_of_special_requests      <int> 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 3, …
## $ reservation_status             <chr> "Check-Out", "Check-Out", "Check-Out", …
## $ reservation_status_date        <chr> "2015-07-01", "2015-07-01", "2015-07-02…

Record your general observations below:

Observation 1: There are about 120 thousand records in the data set with 32 fields for every record. Each field seem to have the correct data type Observation 2: Some of the Values in the dataset are either null or missing

3. External Data Source Validation

The data is originally from the article Hotel Booking Demand Datasets, by Nuno Antonio, Ana Almeida, and Luis Nunes for Data in Brief, Volume 22, February 2019.

4. Data Preparation

Performing Data Cleaning

# Checking datatypes and missing entries of all the variables
# ---
# YOUR CODE GOES BELOW
# 
str(hotel_booking_df)
## 'data.frame':    119390 obs. of  32 variables:
##  $ hotel                         : chr  "Resort Hotel" "Resort Hotel" "Resort Hotel" "Resort Hotel" ...
##  $ is_canceled                   : int  0 0 0 0 0 0 0 0 1 1 ...
##  $ lead_time                     : int  342 737 7 13 14 14 0 9 85 75 ...
##  $ arrival_date_year             : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ arrival_date_month            : chr  "July" "July" "July" "July" ...
##  $ arrival_date_week_number      : int  27 27 27 27 27 27 27 27 27 27 ...
##  $ arrival_date_day_of_month     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ stays_in_weekend_nights       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ stays_in_week_nights          : int  0 0 1 1 2 2 2 2 3 3 ...
##  $ adults                        : int  2 2 1 1 2 2 2 2 2 2 ...
##  $ children                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "BB" "BB" "BB" "BB" ...
##  $ country                       : chr  "PRT" "PRT" "GBR" "GBR" ...
##  $ market_segment                : chr  "Direct" "Direct" "Direct" "Corporate" ...
##  $ distribution_channel          : chr  "Direct" "Direct" "Direct" "Corporate" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "C" "C" "A" "A" ...
##  $ assigned_room_type            : chr  "C" "C" "C" "A" ...
##  $ booking_changes               : int  3 4 0 0 0 0 0 0 0 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
##  $ agent                         : int  NA NA NA 304 240 240 NA 303 240 15 ...
##  $ company                       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient" "Transient" ...
##  $ adr                           : num  0 0 75 75 98 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 0 0 0 1 1 0 1 1 0 ...
##  $ reservation_status            : chr  "Check-Out" "Check-Out" "Check-Out" "Check-Out" ...
##  $ reservation_status_date       : chr  "2015-07-01" "2015-07-01" "2015-07-02" "2015-07-02" ...
sum(is.na(hotel_booking_df))
## [1] 129425

We observe the following from our dataset:

Observation 1: Correct datatypes are assigned for the different fields Observation 2: There are 129425 missing values in our data set

# Checking how many duplicate rows are there in the data
# ---
# YOUR CODE GOES BELOW
# 
# hotel_booking_df[duplicated(hotel_booking_df), ]
sum(duplicated(hotel_booking_df))
## [1] 31994

We choose to keep the duplicates because we don’t have a unique identifier to actually proof that we have duplicates.

# Checking if any of the columns are all null
# ---
# YOUR CODE GOES BELOW
# 
which(colSums(is.na(hotel_booking_df))  == ncol(hotel_booking_df))
## named integer(0)

We observe the following from our dataset:

Observation 1: There are no columns in which all the data is null

# Checking if any of the rows are all null
# ---
# YOUR CODE GOES BELOW

which(rowSums(is.na(hotel_booking_df)) == nrow(hotel_booking_df))
## integer(0)

We observe the following from our dataset:

Observation 1: there is no row in which all the data is null

# Checking the correlation of the features through the use of 
# visualizations the correlation using heatmap
# ---
# YOUR CODE GOES BELOW
# 

head(hotel_booking_df)
# Select the desired columns
hotel_booking_df2 <- hotel_booking_df %>%
  na.omit() %>%
  select(is_canceled, stays_in_weekend_nights, stays_in_week_nights, adults, children, babies, is_repeated_guest, previous_cancellations, previous_bookings_not_canceled, days_in_waiting_list, total_of_special_requests)
hotel_booking_df2
# Create a correlation matrix
corr_matrix <- cor(hotel_booking_df2, method="s")
## Warning in cor(hotel_booking_df2, method = "s"): the standard deviation is zero
head(corr_matrix)
##                         is_canceled stays_in_weekend_nights
## is_canceled              1.00000000              -0.1438928
## stays_in_weekend_nights -0.14389278               1.0000000
## stays_in_week_nights    -0.08137791               0.7794933
## adults                  -0.04922699              -0.0664910
## children                 0.05446082              -0.1683479
## babies                           NA                      NA
##                         stays_in_week_nights      adults    children babies
## is_canceled                      -0.08137791 -0.04922699  0.05446082     NA
## stays_in_weekend_nights           0.77949325 -0.06649100 -0.16834790     NA
## stays_in_week_nights              1.00000000 -0.15183455 -0.12611588     NA
## adults                           -0.15183455  1.00000000  0.20245466     NA
## children                         -0.12611588  0.20245466  1.00000000     NA
## babies                                    NA          NA          NA      1
##                         is_repeated_guest previous_cancellations
## is_canceled                   -0.01183921            -0.01983730
## stays_in_weekend_nights       -0.13333459            -0.02758733
## stays_in_week_nights          -0.26416244            -0.08427287
## adults                        -0.07395879            -0.05488996
## children                       0.17358125            -0.01147280
## babies                                 NA                     NA
##                         previous_bookings_not_canceled days_in_waiting_list
## is_canceled                               -0.005786814                   NA
## stays_in_weekend_nights                   -0.129795621                   NA
## stays_in_week_nights                      -0.233541874                   NA
## adults                                    -0.133680418                   NA
## children                                   0.190626508                   NA
## babies                                              NA                   NA
##                         total_of_special_requests
## is_canceled                           -0.11646585
## stays_in_weekend_nights               -0.12033640
## stays_in_week_nights                  -0.24581095
## adults                                 0.01722921
## children                               0.09985491
## babies                                         NA
# Create a table with long form
corr_df <- melt(corr_matrix)
## Warning in type.convert.default(X[[i]], ...): 'as.is' should be specified by the
## caller; using TRUE
## Warning in type.convert.default(X[[i]], ...): 'as.is' should be specified by the
## caller; using TRUE
corr_df
# Plot the heatmap
ggplot(corr_df, aes(X1, X2, fill = value)) +
  geom_tile(color = "black") +
  geom_text(
    aes(label = round(value, 2)),
    color = "pink"
  ) +
  coord_fixed() +
  labs(
    fill="Pearson Correlation"
  ) +
  scale_fill_gradient2(
    low = "blue",
    high = "red",
    mid = "white",
    limit = c(-1,1)
  ) +
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank()
  )
## Warning: Removed 38 rows containing missing values (geom_text).

We observe the following from our dataset:

There is weak correlation (both positive and negative) between the various fields with the exception of “previous_bookings_not_canceled” and “is_repeated_guest” where we have a strong positive correlation

# Dropping company column because it has alot of missing values 
# and we won't need to answer any of our questions
# ---
# YOUR CODE GOES BELOW
# 
hotel_booking_df = select(hotel_booking_df, -company)
head(hotel_booking_df)

From the data variable description we see that the Distribution Channel category that tells us about Booking distribution.

The term “TA” means “Travel Agents” The term “TO” means “Tour Operators” This allows us to fill the missing values in the agents column with TO

# We replace the mising values i.e. for TO
# ---
# YOUR GOES BELOW
#

hotel_booking_df$agent[is.na(hotel_booking_df$agent)] <- "TO"
head(hotel_booking_df)
# We drop rows where there is no adult, baby and child as 
# these records won't help us.
# ---
# YOUR GOES BELOW
#

hotel_booking_df <- subset(hotel_booking_df, adults!=0 | children!=0 | babies!=0)
head(hotel_booking_df)
# We replace missing children values with rounded mean value
# ---
# Hint i.e. use round()
# ---
# YOUR GOES BELOW
#
hotel_booking_df$children[is.na(hotel_booking_df$children)] <- round(mean(hotel_booking_df$children, na.rm = TRUE), 0)
head(hotel_booking_df)
# Checking for missing values in the dataframe
# ---
# YOUR GOES BELOW
#
sum(is.na(hotel_booking_df))
## [1] 478
# Converting the datatypes of the following columns from float to integer
# i.e. children, company, agent
# ---
# YOUR GOES BELOW
# 
# hotel_booking_df$children <- lapply(hotel_booking_df$children, as.numeric)
# as.integer(hotel_booking_df$children)
head(hotel_booking_df)

5. Solution Implementation

5.a) Questions

# 1. How many bookings were cancelled?
# ---
# Visualisation: Barplot

hotel_booking_df %>%
  group_by(reservation_status) %>%
  summarise(is_canceled = sum(is_canceled))
ggplot(hotel_booking_df, aes(x= reservation_status, y= is_canceled))+
  geom_col(stat = "identity", width =  0.5, fill= "blue") +
  labs(
    title = "Reservation Status",
    x= "Status",
    y= "Number"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size=16, family="Ariel Black", color="purple")
  )
## Warning: Ignoring unknown parameters: stat
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

# 2. What was the booking ratio between resort hotel and city hotel?
# ---
# Barplot of booking ratio between resort hotel and city hotel

resort_city_hotel <- hotel_booking_df %>%
  count(hotel)
resort_city_hotel
ggplot(data=resort_city_hotel, aes(x="", y=n, fill=hotel)) +
  geom_bar(
    stat="identity", width=0.5, color = "white"
  ) +
  coord_polar("y") +
  geom_text(
    aes(label = paste0(round(n / sum(n) * 100, 2), "%")), position = position_stack(vjust = 0.5)) +
  scale_fill_brewer(palette="Set2")+
  labs(
    fill = "Hotel",
    title = "Booking Between Resort Hotel and City Hotel"
  )+
  theme_void()

# 3. What was the percentage of booking for each year?
# ---
# 

year_booking <- hotel_booking_df %>%
  count(arrival_date_year)
year_booking
ggplot(data=year_booking, aes(x="", y=n, fill=factor(arrival_date_year))) +
  geom_bar(
    stat="identity", width=0.5, color = "cyan"
  ) +
  coord_polar("y") +
    geom_text(
    aes(label = paste0(round(n / sum(n) * 100, 2), "%")), position = position_stack(vjust = 0.5)
  ) +
  scale_fill_brewer(palette="Set1") +
  labs(
    fill = "Years",
    title = "Booking for each year") +
  theme_void()

# 4. Which were the most busiest months for hotels?
# ---
# 
busy_months <- hotel_booking_df %>%
  count(arrival_date_month)
busy_months
ggplot(busy_months, aes(x= arrival_date_month, y= n))+
  geom_col(stat = "identity", width =  0.8, fill= "magenta") +
  labs(
    title = "Bookings per Month",
    x= "Month",
    y= "Number of bookings"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size=16, family="Arial", color="blue")
  )
## Warning: Ignoring unknown parameters: stat
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

# 5. From which top 3 countries did most guests come from?
# --- 
# YOUR GOES BELOW
#

hotel_booking_df
countries <- hotel_booking_df %>%
  group_by(country) %>%
  summarise(guests = sum(adults)+sum(children)+sum(babies)) %>%
  # arrange(desc(guests)) %>%
  top_n(3)
## Selecting by guests
countries
ggplot(countries, aes(x= country, y= guests))+
  geom_col(stat = "identity", width =  0.8, fill= "blue") +
  labs(
    title = "Top 3 Countries per Guests",
    x= "Countries",
    y= "Number of Guests"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size=16, family="Arial", color="black")
  )+
  coord_flip()
## Warning: Ignoring unknown parameters: stat
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

# 6.a) How long do most people stay in hotels?
#   b) By city and resort? Separate the data by hotel
# ---
# 

stay_length <- hotel_booking_df %>%
  group_by(hotel) %>%
  summarise(total_stay_days = mean(stays_in_weekend_nights)+mean(stays_in_week_nights))
stay_length
ggplot(data = stay_length, aes(x= hotel, y= total_stay_days))+
  geom_col(stat = "identity", width =  0.8, fill= "dark grey") +
  labs(
    title = "Days Stayed per Hotel",
    x= "Hotel Type",
    y= "Number of Days Stayed"
  ) +
  theme(
    plot.title = element_text(hjust = 0, size=16, family="Arial", color="blue")
  )+
  theme_classic()
## Warning: Ignoring unknown parameters: stat

# 7. Which was the most booked accommodation type (Single, Couple, Family)?
# ---
#
accommodation <- hotel_booking_df %>%
  count(reserved_room_type)
accommodation
ggplot(data = accommodation, aes(x= reserved_room_type, y= n))+
  geom_col(stat = "identity", width =  0.8, fill= "brown") +
  labs(
    title = "Bookings per Accomodation Type",
    x= "Room Type",
    y= "Number of Bookings"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size=16, family="Arial", color="blue")
  )+
coord_flip()
## Warning: Ignoring unknown parameters: stat
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

5.b) Recommendations

From the above analysis, below are our recommendations: A lot of cancellation do happen accross the hotels while only a few are a no show. Reasons for cancellation should be investigated so as to minimize on lost revenue

Twice as much guests prefer city hotels to resort hotels. Management should invest in city hotels to ensure they have sufficnent capacity

Booking are not consistent year on year and management should be prepared for flactuations in number of guests each year

August is the busisest month and management must be prepared to handle the high number of guests in busy months to tap in revenue Incentives can be provided during the less busy months to encourage bookings

Most guests come from Portugal, Great Britain and France

6. Challenging your Solution

In this step, we review our solution and implement approaches that could potentially provide a better outcome. In our case, we could propose the following question that wasn’t answered in our solution because it couldn’t have greatly contributed to our recommendation.

# When should hotels provide special offers?
# ---
# YOUR GOES BELOW
# 
not_busy_months <- hotel_booking_df %>%
  count(arrival_date_month)%>%
  top_n(-3)
## Selecting by n
not_busy_months
ggplot(not_busy_months, aes(x= arrival_date_month, y= n))+
  geom_col(stat = "identity", width =  0.5, fill= "#3B7A57") +
  labs(
    title = "Least Bookings per Month",
    x= "Month",
    y= "Number of bookings"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size=16, family="Arial", color="#4B3621")
  )
## Warning: Ignoring unknown parameters: stat
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

Our observations: November, December, and January are the least busy months. This means most of the facilities in the hotel remain underutilized and management can seek to provide offers that incentivize guests to make bookings during these months

How does this observation tie to our solution? This observation helps management with an opportunity to further grow their revenue by attracting guests even during the low seasons

7. Follow up questions

During this step, you rethink and propose other ways that you can improve your solution.

a). Did we have the right data? Yes, we had the right data which was from a credible source b). Do we need other data to answer our question? provided data was sufficient to answer our research questions, however, other data that could provide more insights such as reasons for the high number of cancellations would have been of great help c). Did we have the right question? Research questions formulated were adequate to provide sound recommendations